[core] Add manifest partition pruning to DV validation in MergingSnapshotProducer #15653
anoopj wants to merge 1 commit into apache:main
validateAddedDVs() currently opens every delete manifest regardless of the conflict detection filter. This PR adds manifest evaluator-based filtering, skipping manifests that can't contain matching partitions. This should speed up commits. Note that other code paths in the snapshot producer use ManifestGroup, which already does this filtering; validateAddedDVs() was the only gap where we didn't filter.
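To illustrate the idea of skipping manifests by partition range, here is a minimal, self-contained sketch. It is not the Iceberg implementation: `ManifestSummary`, `ManifestPruningSketch`, and the single integer partition field are hypothetical stand-ins, and the real code evaluates the conflict detection filter against each manifest's partition summaries rather than a plain integer range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a delete manifest: a path plus the lower and upper
// bound of one integer partition field across all entries in the manifest.
final class ManifestSummary {
  final String path;
  final int lowerBound;
  final int upperBound;

  ManifestSummary(String path, int lowerBound, int upperBound) {
    this.path = path;
    this.lowerBound = lowerBound;
    this.upperBound = upperBound;
  }
}

final class ManifestPruningSketch {
  private ManifestPruningSketch() {}

  // True if the manifest's partition range overlaps [filterLower, filterUpper].
  // A false result means the manifest cannot contain matching partitions.
  static boolean mightMatch(ManifestSummary manifest, int filterLower, int filterUpper) {
    return manifest.lowerBound <= filterUpper && manifest.upperBound >= filterLower;
  }

  // Returns only the manifests that would need to be opened for validation.
  static List<ManifestSummary> manifestsToOpen(
      List<ManifestSummary> manifests, int filterLower, int filterUpper) {
    List<ManifestSummary> result = new ArrayList<>();
    for (ManifestSummary manifest : manifests) {
      if (mightMatch(manifest, filterLower, filterUpper)) {
        result.add(manifest);
      }
    }
    return result;
  }
}
```

With manifests covering the ranges [1, 10], [11, 20], and [30, 40], a filter on [15, 25] would leave only the second manifest to open; the other two are skipped without being read.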
Branch updated from b145372 to 0caa787.
Along similar lines: ManifestFilterManager#removeDanglingDeletesFor receives DataFiles, which carry partition info, but stores only their paths as strings. As a result, ManifestFilterManager#canContainDroppedFiles returns true for every delete manifest, since there is no partition info to filter against. The existing ManifestFilterManager#delete(F File) handles this correctly: it stores both the file and a PartitionSet, then uses that information in ManifestFileUtil#canContainAny to skip manifests whose partition range doesn't overlap. Please let me know if I am missing something here; happy to contribute if this is a valid improvement.
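The difference between tracking paths only and tracking paths plus partitions can be sketched as follows. This is a simplified illustration, not Iceberg code: `DroppedFileTrackerSketch` and its methods are hypothetical, and partitions are modeled as plain strings instead of a PartitionSet checked against manifest partition ranges.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker for dropped data files whose delete files may now dangle.
final class DroppedFileTrackerSketch {
  private final Set<String> droppedPaths = new HashSet<>();
  private final Set<String> droppedPartitions = new HashSet<>();

  // Record the partition alongside the path instead of the path alone.
  void drop(String path, String partition) {
    droppedPaths.add(path);
    droppedPartitions.add(partition);
  }

  // Path-only check: any delete manifest might be affected once a file is dropped.
  boolean canContainDroppedFilesPathOnly() {
    return !droppedPaths.isEmpty();
  }

  // Partition-aware check: only manifests that share a partition with a dropped
  // file need to be opened.
  boolean canContainDroppedFiles(Set<String> manifestPartitions) {
    return !Collections.disjoint(droppedPartitions, manifestPartitions);
  }
}
```

In this sketch, once a file in partition `day=1` is dropped, the path-only check reports every manifest as a candidate, while the partition-aware check rules out a manifest that covers only `day=2`.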
Your analysis above looks correct to me. This is an existing issue in ManifestFilterManager. The fix would be straightforward: store a
Raised #15671 for the same.